## Context:
Embedding Generation: Converts text/image into a vector representation for semantic search.
Image Embedding Generation: Uses CLIP for one-shot vector generation.
Retrieval-Augmented Generation: OpenAI chat with local RAG context. Ask OpenAI questions against the contents of specific local files.
Text Generation: Generates an SQL query, a Create Find Request (JSON), or a natural language response for the provided question.
Fine-Tuning: Customize a pre-trained model with your own data. The newly fine-tuned model can be used as a text generation model after successful fine-tuning. Available only on Macs with Apple silicon.

## Prerequisite
Using Miniforge, an open source package manager, is recommended.

To learn more and install Miniforge on Mac, Ubuntu, and Windows, see:
https://github.com/conda-forge/miniforge

To set up the Open Source Server:
1. Create a conda environment and install its dependencies:
    conda env create --file=environment.yml
2. Activate the environment:
    conda activate fmosllm
3. (macOS) Install Xcode Command Line Tools for Image Embedding:
    This prevents the "Failed building wheel for greenlet" error when installing the pillow-heif dependencies.
    xcode-select --install
4. (Macs with Apple silicon) Install mlx-lm and its dependencies:
    pip install mlx-lm

## Server Configuration
Check the settings file and make any necessary changes before running the server. The file is located at:
```bash
./shared/fm_LLMOS_Settings.json
```
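A hand-edited settings file that is no longer valid JSON is an easy way to break server startup. The sketch below is a hypothetical helper (not part of the server) that parses the file and reports the line and column of any syntax error. It assumes the file's top level is a JSON object.

```python
import json
from pathlib import Path

# Path from this README; adjust it if your install location differs.
SETTINGS_PATH = Path("./shared/fm_LLMOS_Settings.json")


def load_settings(path=SETTINGS_PATH):
    """Parse the settings file, turning JSON syntax errors into a readable message."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"{path}: invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
        ) from exc
```

For example, run `for key, value in load_settings().items(): print(key, "=", value)` from the install directory to review the current values before starting the server.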

## Usage
```bash
python ./server/fm_LLMOS_StartServer.pyc
```

## Certificate & Keyfile
A certificate and keyfile are required to be set as environment variables to enable HTTPS connections with FileMaker Server, FileMaker Pro, and FileMaker Go.

In macOS and Ubuntu, set the environment variables for the current shell session by running:
export CERTFILE="<Path to Certificate File>"
export KEYFILE="<Path to Key File>"

In Windows, set the environment variables for the current session by running:
PowerShell: $env:CERTFILE="<Path to Certificate File>"
            $env:KEYFILE="<Path to Key File>"
CMD:        set "CERTFILE=<Path to Certificate File>"
            set "KEYFILE=<Path to Key File>"

By default, SSL is enabled, so a certificate and keyfile are required. SSL can be disabled using the USE_SSL variable (NOT RECOMMENDED).
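Before starting the server, it can help to confirm that the variables are visible to Python and point at real files. The following is a minimal, hypothetical pre-flight check (the function name is illustrative, not part of the server). It also flags values wrapped in literal quotes, which is what CMD stores if you write `set NAME="value"`.

```python
import os
from pathlib import Path


def check_path_variables(names=("CERTFILE", "KEYFILE")):
    """Return one message per variable that is unset, quoted, or not an existing file."""
    problems = []
    for name in names:
        value = os.environ.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif len(value) >= 2 and value[0] == value[-1] == '"':
            # CMD's `set NAME="value"` keeps the quotes as part of the value.
            problems.append(f"{name} contains literal quotes: {value}")
        elif not Path(value).is_file():
            problems.append(f"{name} does not point to an existing file: {value}")
    return problems
```

An empty list means both variables look usable. The same function accepts any other path variable, for example `check_path_variables(("PKI_KEYFILE",))`.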

## Generating PKI Token
To secure the connection between FileMaker Pro or FileMaker Server and the Open Source LLM script, you need to generate a PKI file. Set the path of the PKI file using the PKI_KEYFILE environment variable.

In macOS and Ubuntu, set the environment variable for the current shell session by running:
export PKI_KEYFILE="<Path to PKI Keyfile>"

In Windows, set the environment variable for the current session by running:
PowerShell: $env:PKI_KEYFILE="<Path to PKI Keyfile>"
CMD:        set "PKI_KEYFILE=<Path to PKI Keyfile>"

When VALIDATE_PKI is set to True:
Follow the directions in the following Claris article to generate a PKI file and JWT token:
https://support.claris.com/s/article/Using-PKI-authentication-for-FileMaker-Server-Admin-API-calls?language=en_US

Use the following paths to locate the files mentioned in the instructions:
Windows: C:\Program Files\FileMaker\FileMaker Server\Tools\AdminAPI_PKIAuth\
macOS: /Library/FileMaker Server/Tools/AdminAPI_PKIAuth/
Linux: /opt/FileMaker/FileMaker Server/Tools/AdminAPI_PKIAuth/

Provide the JWT token (it should start with eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9) in the "API key" option of the Configure AI Account script step.
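The expected prefix is the base64url encoding of the JWT header `{"alg":"RS256","typ":"JWT"}`, meaning the token is signed with RSA and SHA-256. The sketch below decodes only that header so you can confirm you copied the right value. It does not verify the signature; that is the server's job when VALIDATE_PKI is True.

```python
import base64
import json


def jwt_header(token):
    """Decode the header segment of a JWT without verifying the signature."""
    segment = token.strip().split(".")[0]
    # JWT segments are unpadded base64url; restore the padding before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

For a correctly generated token, `jwt_header(token)["alg"]` is `"RS256"`.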

When VALIDATE_PKI is set to False:
The FastAPI server will not validate the PKI token, which results in a less secure environment.

## Troubleshooting:
1. AttributeError: module 'jwt.exceptions' has no attribute 'DecodeError'. Did you mean: 'JWSDecodeError'? 
Fix: Ensure that PyJWT is installed and jwt is uninstalled by running the following:
pip uninstall jwt
pip uninstall PyJWT
pip install PyJWT

2. Could not deserialize key data. The data may be in an incorrect format, it may be encrypted with an unsupported algorithm, or it may be an unsupported key type (for example, EC curves with explicit parameters).
Fix: Ensure that the contents of the PKI_KEYFILE file correspond to the JWT token provided in the script step. The two values are different: the token must correspond to the file contents generated in the "Name of public key on FileMaker Server Admin Console" step of the "fmsadminapi_pki_token_example.py" Python script. See "Generating PKI Token" for more information.
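Both problems above can be checked quickly from the activated environment. The sketch below is a hypothetical diagnostic, not part of the server. For item 1, it reports whether the `jwt` and `PyJWT` distributions are both installed (they both provide a module named `jwt`). For item 2, it assumes the key file is PEM-encoded, which is what the cryptography library's PEM loaders expect, and flags the common mix-up of pasting the JWT token into the key file.

```python
import os
from importlib import metadata
from pathlib import Path


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def jwt_conflict():
    """Item 1: True when both the `jwt` and `PyJWT` distributions are installed."""
    return installed_version("jwt") is not None and installed_version("PyJWT") is not None


def describe_key_file(path=None):
    """Item 2: roughly classify the key file; defaults to the PKI_KEYFILE variable."""
    path = path or os.environ.get("PKI_KEYFILE")
    if not path:
        return "PKI_KEYFILE is not set"
    text = Path(path).read_text(encoding="utf-8", errors="replace").strip()
    if text.startswith("eyJ"):
        # A JWT pasted into the key file is a common mix-up.
        return "looks like a JWT token, not key data"
    first_line = text.splitlines()[0] if text else ""
    if first_line.startswith("-----BEGIN ") and first_line.endswith("-----"):
        return "PEM block: " + first_line[len("-----BEGIN "):-len("-----")]
    return "not PEM-encoded"
```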


## Register Hugging Face token
### To review and run inference using the Gemma model, follow these instructions:
1. Create a Hugging Face account, then review and accept the terms for Gemma.
CodeGemma model page: https://huggingface.co/google/codegemma-7b-it
Create a Hugging Face Account: https://huggingface.co/join

2. Create an access token for your Hugging Face account and select the following option:
"Read access to contents of all public gated repos you can access"
https://huggingface.co/settings/tokens

3. Activate the conda environment created in the "Prerequisite" section:
conda activate <name>

4. Set the Hugging Face token using the following command, replacing HUGGINGFACE_TOKEN with your access token:
python -c "from huggingface_hub.hf_api import HfFolder; HfFolder.save_token('HUGGINGFACE_TOKEN')"

## Embedding Vector Generation Models
"all-MiniLM-L12-v2"            # 384 dimensions; truncates input to 256 word pieces
"multi-qa-MiniLM-L6-cos-v1"    # 384 dimensions; truncates input to 512 word pieces

To evaluate the speed, performance, and accuracy of Sentence Transformer Models, see:
https://www.sbert.net/docs/pretrained_models.html
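Once text is converted to vectors, semantic search amounts to ranking stored vectors by how close they are to the query vector. Cosine similarity is a common measure for this. The dependency-free sketch below shows the idea with toy 3-dimensional vectors. In practice the vectors would have 384 dimensions and come from a call such as `SentenceTransformer("all-MiniLM-L12-v2").encode(texts)`. This is an illustration of the technique, not the server's internal search code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, records):
    """Sort (record_id, vector) pairs by similarity to the query, most similar first."""
    return sorted(records, key=lambda r: cosine_similarity(query_vec, r[1]), reverse=True)
```

For example, `rank([1.0, 0.0, 0.0], [("a", [0.0, 1.0, 0.0]), ("b", [1.0, 0.1, 0.0])])` puts `"b"` first because its vector points in nearly the same direction as the query.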

## Image Embedding Generation Models
OpenAI's CLIP models are supported for image embedding. Images should be less than 100 MB. The supported image formats are JPG, PNG, TIF, GIF, PSD, and BMP. Unsupported formats may cause embedding or accuracy issues.
Note: When using CLIP, the model used in a Perform Semantic Find script step should match the model used in Insert Embedding and Insert Embedding in Found Set script steps.
"clip-ViT-L-14"    # Best performing CLIP model
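To catch unsupported files before embedding a large found set, you could pre-screen them by extension and size. This is a hypothetical helper, and it makes two assumptions: the listed formats map to the extensions below (including .jpeg and .tiff variants), and "100 MB" means 100 MiB. It checks only the file name and size, not the actual image contents.

```python
from pathlib import Path

# Assumption: extensions for the JPG, PNG, TIF, GIF, PSD, and BMP formats listed above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".gif", ".psd", ".bmp"}
MAX_BYTES = 100 * 1024 * 1024  # Assumption: "100 MB" read as 100 MiB.


def image_problems(path):
    """Return a list of reasons the file may fail or embed poorly (empty if none found)."""
    path = Path(path)
    problems = []
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension {path.suffix or '(none)'}")
    if path.stat().st_size >= MAX_BYTES:
        problems.append("file is 100 MB or larger")
    return problems
```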

Note: To perform a semantic search using multilingual text, use the "clip-ViT-B-32" model for image embedding, then use the following text embedding model in the Perform Semantic Find script step:
"sentence-transformers/clip-ViT-B-32-multilingual-v1"

## RAG Access Control
RAG spaces are private to any user by default. If access control is necessary, you can set it up in FileMaker Server Admin Console. Go to AI Services > Keys, then create an API key for each RAG user who should have access to a specific RAG space ID with the specified privileges. In FileMaker scripts, the key should be used as the API key in the Configure RAG Account script step to authorize access.

## Text Generation Models
"google/codegemma-7b-it"          # As of July 19, 2024, the most up-to-date version of CodeGemma
For more information & Gemma License Agreement: https://huggingface.co/google/codegemma-7b-it

## MLX
MLX is an array framework from Apple for machine learning research on Apple silicon. For query and text generation on Macs with Apple silicon, convert the text generation model into MLX format for faster inference.
To convert a model, set up the Python environment using the instructions in the "Prerequisite" section. Then follow these instructions:

1. Activate the conda environment created in the "Prerequisite" section:
conda activate <environment name>

2. Navigate to the Open_Source_LLM folder.
cd Open_Source_LLM

3. Run the mlx_lm.convert command to convert the model into MLX format. This command also quantizes the model to 8 bits, and the resulting model should be in "{Current Directory}/google/codegemma-7b-it".
python -m mlx_lm.convert --hf-path google/codegemma-7b-it --q-bits 8 -q --mlx-path "google/codegemma-7b-it"

## Fine-Tuning
Fine-tuning is supported only on Macs with Apple silicon. Fine-tuning can be performed only on text generation models that have already been downloaded and acknowledged. Fine-tuning is resource intensive and may take anywhere from several minutes to several hours, or longer, depending on your fine-tuning parameters.

You can start fine-tuning a model in a FileMaker script or in FileMaker Server Admin Console. To check the status of a fine-tuned model, in FileMaker Server Admin Console, go to AI Services > Fine-Tuned Models.

### Fine-Tuning Training File 
You may need to specify a training file in the .jsonl format. Ensure that your training file consists of a single example per line in the following format:

{"messages": [{"role": "user", "content": "Is this an example?"}, {"role": "assistant", "content": "Yes, this is an example."}]}

In this example, "role": "user" refers to what a person would ask the model. The "role": "assistant" refers to what the model would reply with.
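Writing the file with a JSON library guarantees one example per line, because line breaks inside questions or answers are escaped as `\n`. The sketch below shows a hypothetical writer for question/answer pairs and a validator that reports problems by line number. Treating blank lines as errors is a conservative assumption; the trainer may tolerate them.

```python
import json


def write_training_file(path, pairs):
    """Write (question, answer) pairs as one {"messages": [...]} object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for question, answer in pairs:
            record = {"messages": [
                {"role": "user", "content": question},
                {"role": "assistant", "content": answer},
            ]}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def validate_training_file(path):
    """Return (line_number, problem) tuples; an empty list means every line passed."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                problems.append((lineno, "blank line"))
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            messages = record.get("messages") if isinstance(record, dict) else None
            if not isinstance(messages, list) or not messages:
                problems.append((lineno, 'missing "messages" list'))
                continue
            roles = [m.get("role") for m in messages if isinstance(m, dict)]
            if "user" not in roles or "assistant" not in roles:
                problems.append((lineno, "needs at least one user and one assistant message"))
    return problems
```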

### Fine-Tuning System Prompt
To specify a system prompt for fine-tuning, in FileMaker Server Admin Console, go to AI Services > Model Server, then select the "System Prompt for Fine-Tuning" option. All examples will use the same system prompt.

### Using a Fine-Tuned Model for Text Generation
To use your fine-tuned model for text generation, use the name of the fine-tuned model ("fm-mlx-{custom-name}") as you would any other text generation model in a FileMaker script.

Note: Fine-tuning may not always produce better results than a pre-trained model downloaded from Hugging Face (the base model). You may need to try different combinations of fine-tuning parameters before a fine-tuned model performs better than the base model.

# Getting Started with the AI Model Server on CUDA-Supported Hardware
The following steps refer to the AI Model Server installed here:

macOS: /Library/FileMakerLLM/Open_Source_LLM/ 
Windows: [drive]:\ProgramData\FileMaker\Open_Source_LLM\ 
Ubuntu: /opt/FileMaker/Open_Source_LLM/

As installed, the AI Model Server doesn't support CUDA hardware for embedding generation or text generation.

Running torch.cuda.is_available() may return True. However, the Python server may still return the following error: "RuntimeError: No GPU found. A GPU is needed for quantization."

## Preparing Your Environment to Run the AI Model Server with Support for CUDA Hardware
To check that your hardware supports CUDA, run "nvidia-smi" on macOS or Ubuntu, or "nvidia-smi.exe" on Windows. The output should describe your CUDA-supported hardware. You may also need to install CUDA drivers from http://www.nvidia.com/Download/Find.aspx.
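To see the driver's view and PyTorch's view side by side, you could run a small diagnostic like the hypothetical one below inside the activated environment. It calls nvidia-smi with its standard `--query-gpu` and `--format` options if the tool is on the PATH, then reports whether the installed PyTorch is a CUDA build (`torch.version.cuda` is None for CPU-only builds) and whether it can see a GPU. As noted above, a GPU being visible to PyTorch does not by itself guarantee that the server will use it.

```python
import shutil
import subprocess


def cuda_report():
    """Collect what nvidia-smi and PyTorch each report about CUDA on this machine."""
    report = {"nvidia_smi": None, "torch_version": None, "torch_cuda_build": None,
              "torch_cuda": False, "device": None}
    exe = shutil.which("nvidia-smi")  # On Windows this also finds nvidia-smi.exe.
    if exe:
        result = subprocess.run(
            [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=False,
        )
        report["nvidia_smi"] = result.stdout.strip() or result.stderr.strip()
    try:
        import torch
    except ImportError:
        return report  # PyTorch is not installed in this environment.
    report["torch_version"] = torch.__version__
    report["torch_cuda_build"] = torch.version.cuda
    report["torch_cuda"] = torch.cuda.is_available()
    if report["torch_cuda"]:
        report["device"] = torch.cuda.get_device_name(0)
    return report
```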
 
Ensure that the AI Model Server is stopped before proceeding with the following steps. 
 
While following these steps, you may see the error "'conda' is not recognized as an internal or external command". To fix this, run the following platform-specific command, then open a new terminal:

macOS: "<Model-Server-Install-Directory>/server/miniforge3/condabin/conda init"
Windows: "<Model-Server-Install-Directory>\server\miniforge3\condabin\conda.bat init"
Ubuntu: "<Model-Server-Install-Directory>/server/miniforge3/condabin/conda init"

To enable the AI Model Server to recognize and support CUDA hardware, you need to install additional Python libraries on your machine or in your virtual environment, as described below.
1. Activate your Python environment.
    1. If you manually installed the python libraries, you can use the command "conda activate <env-name>"
    2. If you installed the AI Model Server using FileMaker Server Admin Console, use the following command:
        1. "conda activate fmosllm"
        2. If you want to specify the full path to the conda environment, you can do so with "conda activate <Model-Server-Install-Directory>/server/miniforge3/envs/fmosllm".
2. Run the following commands to install additional dependencies. This may take some time to complete.
    1. "conda install pytorch=2.1.0 torchvision=0.16.0 torchaudio=2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia"
    2. "conda install numpy==1.24.1"
    3. "pip install bitsandbytes==0.45.5"
3. The AI Model Server should now start with CUDA-supported hardware.
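If the server still does not use the GPU, a mismatched package version is one thing worth ruling out. The sketch below compares installed versions against the ones pinned in the commands above. It assumes conda's pytorch package registers its metadata under the distribution name "torch", and it ignores local version suffixes such as "+cu118" that pip-installed CUDA builds may carry.

```python
from importlib import metadata

# Versions pinned in the installation commands above.
PINNED = {"torch": "2.1.0", "torchvision": "0.16.0", "torchaudio": "2.1.0",
          "numpy": "1.24.1", "bitsandbytes": "0.45.5"}


def version_mismatches(pinned=PINNED):
    """Map each distribution whose installed version differs from the pin to what was found."""
    mismatches = {}
    for dist, wanted in pinned.items():
        try:
            found = metadata.version(dist)
        except metadata.PackageNotFoundError:
            found = None
        # Compare only the public version, dropping any local "+..." suffix.
        if found is None or found.split("+")[0] != wanted:
            mismatches[dist] = found
    return mismatches
```

An empty result means every pinned package is present at the expected version; a value of None means the package is missing.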

## Troubleshooting
1. The AI Model Server doesn't start and you see "OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized" in your fm_LLMOS_Debug.log or fm_LLMOS_Debug.txt file.
    How to Fix:
    1. Navigate to the Library/bin folder of your environment.
        1. If you manually installed the Python libraries, you can use "conda env list" to find where your environment is on your machine.
        2. If you installed the AI Model Server using FileMaker Server Admin Console, go to "<Model-Server-Install-Directory>/server/miniforge3/envs/fmosllm/Library/bin".
    2. Delete the libiomp5md.dll file from the bin folder.
    3. Restart the AI Model Server.
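The error means two copies of the OpenMP runtime were loaded. Commonly, one copy is in Library\bin and another is bundled inside a package such as torch. Before deleting anything, you could list every copy in the environment with a sketch like the one below. Run it with the environment's own Python so that `sys.prefix` points at that environment. It only lists files and deletes nothing.

```python
import sys
from pathlib import Path


def find_runtime_copies(root=sys.prefix, name="libiomp5md.dll"):
    """List every file called `name` under `root`, sorted by path."""
    return sorted(p for p in Path(root).rglob(name) if p.is_file())
```

For example, `for p in find_runtime_copies(): print(p)` shows which copy sits in Library\bin, the one the steps above remove.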